AITopics | Natrona County

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Anderson, Carolyn Jane, Biswas, Joydeep, Boruch-Gruszecki, Aleksander, Cassano, Federico, Feldman, Molly Q, Guha, Arjun, Lucchetti, Francesca, Wu, Zixuan

arXiv.org Artificial Intelligence, Feb-6-2025

Existing benchmarks for frontier models often test specialized, "PhD-level" knowledge that is difficult for non-experts to grasp. In contrast, we present a benchmark based on the NPR Sunday Puzzle Challenge that requires only general knowledge. Our benchmark is challenging for both humans and models; however, correct solutions are easy to verify, and models' mistakes are easy to spot. Our work reveals capability gaps that are not evident in existing benchmarks: OpenAI o1 significantly outperforms other reasoning models that are on par with it on benchmarks that test specialized knowledge. Furthermore, our analysis of reasoning outputs uncovers new kinds of failures. DeepSeek R1, for instance, often concedes with "I give up" before providing an answer that it knows is wrong. R1 can also be remarkably "uncertain" in its output, and in rare cases it does not "finish thinking," which suggests the need for an inference-time technique to "wrap up" before the context window limit is reached. We also quantify the effectiveness of reasoning longer with R1 and Gemini Thinking to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.

benchmark, large language model, machine learning, (17 more...)

arXiv:2502.01584

Country:

South America (0.04)
Oceania > Australia (0.04)
North America > United States > Wyoming > Natrona County > Casper (0.04)
(10 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection

Veyseh, Amir Pouran Ben, Van Nguyen, Minh, Dernoncourt, Franck, Nguyen, Thien Huu

arXiv.org Artificial Intelligence, Nov-17-2022

Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. For non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. Some multilingual ED datasets exist; however, they tend to cover only a handful of languages and focus mainly on popular ones, leaving many languages uncovered. In addition, the current datasets are often small and not accessible to the public. To overcome those shortcomings, we introduce a new large-scale multilingual dataset for ED (called MINION) that consistently annotates events for 8 different languages; 5 of them have not been supported by existing multilingual datasets. We also perform extensive experiments and analysis to demonstrate the challenges and transferability of ED across languages in MINION, which together call for more research effort in this area.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv:2211.05958

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Oregon > Lane County > Eugene (0.14)
North America > Dominican Republic (0.04)
(16 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Prediction of Construction Cost for Field Canals Improvement Projects in Egypt

Elmousalami, Haytham H.

arXiv.org Artificial Intelligence, May-19-2019

Field canals improvement projects (FCIPs) are among the ambitious projects constructed to save fresh water. To finance these projects, conceptual cost models are needed to predict preliminary costs accurately at the early stages of a project. The first step in developing a conceptual cost model is to identify the key cost drivers affecting the project. Input variable selection therefore remains an important part of model development, as poor variable selection can decrease model precision. The study identified the most important cost drivers of FCIPs using both a qualitative approach and a quantitative approach. Subsequently, the study developed a parametric cost model based on machine learning methods such as regression methods, artificial neural networks, fuzzy models, and case-based reasoning.

analytic hierarchy process, ground transportation, neural network, (25 more...)

arXiv:1905.11804

Country:

Africa > Middle East > Egypt (0.64)
North America > United States > Wyoming > Natrona County (0.60)
North America > United States > Nebraska > Scotts Bluff County (0.60)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
(2 more...)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Energy > Oil & Gas (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(5 more...)

Janice: Excited for eclipse

FOX News, Aug-19-2017, 20:22:37 GMT

I was 8 years old and remember being both terrified and intrigued by something that was being talked about everywhere. This wasn't a storyline out of a science fiction movie or novel; this was real, and happening here on Earth. Millions of people were going to witness something that happens maybe a couple of times in a lifetime: a total solar eclipse. Our teachers were planning lessons about this incredible celestial event. Chalkboard diagrams, planetary mobiles and handmade viewing devices were being created out of shoe boxes.

artificial intelligence, eclipse, science fiction, (11 more...)

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.15)
North America > United States > South Carolina > Greenville County > Greenville (0.06)
North America > United States > Wyoming > Natrona County > Casper (0.05)
(6 more...)

Industry: Media > Film (0.50)

Technology: Information Technology > Artificial Intelligence > Science Fiction (0.50)